Data Model Definition: an abstraction of real world entities and their relationships into structures that can be implemented with a computer language.
OUTLINE
I. Introduction
II. Data models
III. Data Base Management Systems (DBMS)
IV. Data Models for Spatial Data
I. Introduction
- direct: Rapid data retrieval according to a key attribute (e.g. looking up a spelling in a dictionary: key attribute + additional information). In direct files the data items themselves provide the means of ordering (e.g. soil series names, with an index to the location of the names beginning with a particular letter).
- indirect: The file may be ordered by soil profile, but we may want information on soil depth, drainage, pH, texture or erosion. If the poorly drained soils need to be identified we must use a linear search unless we invert the file. An inverted file is built with one initial linear pass that indexes the records by the attribute of interest. An example of an inverted file is a topic index in a book.
These data structures provide very efficient access to information pertaining to a single entity. But we need more. We need to relate different entities.
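The difference between direct access, linear search and an inverted file can be sketched in a few lines of Python. This is a minimal illustration only: the soil series names, attribute values and function names are made up for the example and do not come from any particular system.

```python
# Hypothetical soil-profile records; names and values are illustrative only.
profiles = [
    {"series": "Alpha", "drainage": "poor", "ph": 5.5},
    {"series": "Bravo", "drainage": "well", "ph": 6.8},
    {"series": "Cedar", "drainage": "poor", "ph": 6.1},
]

# Direct file: the key attribute (series name) locates the record itself.
direct = {p["series"]: p for p in profiles}


def linear_search(records, attribute, value):
    """Scan every record; needed when the file is not ordered on `attribute`."""
    return [r["series"] for r in records if r[attribute] == value]


def build_inverted(records, attribute):
    """One linear pass builds an index from attribute value to record keys."""
    index = {}
    for r in records:
        index.setdefault(r[attribute], []).append(r["series"])
    return index


# Inverted file on drainage: later queries no longer scan the whole file.
by_drainage = build_inverted(profiles, "drainage")
poorly_drained = by_drainage["poor"]
```

Building the inverted file costs one full scan, after which each query on that attribute is a single lookup, much like turning to a topic index instead of reading the whole book.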
II. Data Models (Laurini and Thompson, 1992 and ANSI/X3/SPARC, 1978)
As data management became more complex a framework was needed to understand the transformation of real world systems and processes into structures that could be implemented in a computer.
1. External model: provides the basis for understanding the real world (e.g. non-spatial: a set of entities; spatial: the world as a continuously varying surface, as a discrete set of objects in space, or as a set of thematic layers).
2. Conceptual data model: provides the organizing principles that translate the external data models into functional descriptions of how data objects are related to one another (e.g. non-spatial: E-R model; spatial: raster, vector, object representation).
3. Logical data model: provides the explicit forms that the conceptual models can take and is the first step in computing (e.g. non-spatial: hierarchical, network, relational; spatial: 2-d matrix, map file, location list, point dictionary, arc/nodes).
4. Internal data model: low level data structures, records, pointers, etc.
III. Data Base Management Systems (DBMS)
1- Definition: Data Base Management Systems: A system used to organize, access, maintain and manipulate object or entity data. A DBMS controls input, output, storage and retrieval of entity data. Essential features of a data base are fast access and cross referencing of entities.
2- Requirements for a DBMS
A DBMS should provide:
- Data Independence: the data base can change with little or no impact on the user programs
- Data Sharing: must have coordinated simultaneous access. Concurrency control mechanism.
- Maintenance of Data Integrity: DBMS helps enforce certain consistency constraints (e.g. a coordinate has both lat and long; # of seats sold on an airplane <= # of seats on the plane)
- Security: DBMS provides mechanism for security/authorization from disclosure/destruction of data.
- Centrality of Control: DB administrator to resolve conflicts and meet user requirements
- Reduce Application Development Time
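As an illustration of the data integrity requirement, the seat example above can be expressed as a constraint that the DBMS itself enforces. The sketch below uses Python's standard `sqlite3` module; the table, column names, flight number and the `sell_seat` helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE flight (
        flight_no   TEXT PRIMARY KEY,
        seats_total INTEGER NOT NULL,
        seats_sold  INTEGER NOT NULL,
        CHECK (seats_sold <= seats_total)  -- consistency constraint
    )
    """
)
# Hypothetical flight with one seat left.
conn.execute("INSERT INTO flight VALUES ('XY100', 150, 149)")


def sell_seat(connection, flight_no):
    """Try to sell one seat; return False if the DBMS rejects the update."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "UPDATE flight SET seats_sold = seats_sold + 1 "
                "WHERE flight_no = ?",
                (flight_no,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = sell_seat(conn, "XY100")   # fills the last seat
second = sell_seat(conn, "XY100")  # would oversell, so it is rejected
```

The rule lives in the data base rather than in each user program, so every application that shares the data is held to the same constraint.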
3- E-R data model: a conceptual data model in which information is represented by entities and relationships between entities
4- Terminology
a. entity - a distinguishable object in the real world (people, forest stand, watershed, ...etc.)
b. relationship - a correspondence or association between two or more entities.
c. attributes - the properties which describe an entity.
d. functionality - how many entities from one entity set can be associated with another set
e. primary key - main key for entity identification, one record per indexed attribute.
f. secondary key - may have multiple record occurrences per index attribute.
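The terminology can be made concrete with a small sketch. The entity sets, attribute names and values below are hypothetical; the point is only to show where each term applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Watershed:            # entity set
    watershed_id: int       # primary key
    name: str               # attribute


@dataclass(frozen=True)
class ForestStand:          # entity set
    stand_id: int           # primary key: one record per value
    cover_type: str         # attribute usable as a secondary key
    watershed_id: int       # relationship "lies in": many stands, one watershed


watersheds = [Watershed(1, "North Fork")]
stands = [
    ForestStand(10, "pine", 1),
    ForestStand(11, "oak", 1),
    ForestStand(12, "pine", 1),
]

# Primary key index: each key value identifies exactly one record.
by_primary = {s.stand_id: s for s in stands}

# Secondary key index: one key value may match several records.
by_secondary = {}
for s in stands:
    by_secondary.setdefault(s.cover_type, []).append(s.stand_id)
```

Here the functionality of the "lies in" relationship is many to one: several stands may point to the same watershed, but each stand points to only one.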
5- Hierarchical data model - one to many relationship.
+ easy to update and expand.
+ easy data access for keys.
+ ideal for data that is inherently hierarchical.
- poor access for associated attributes.
- Restrictive paths.
- limited to one to many relationships.
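The trade-off in the hierarchical model can be sketched with nested dictionaries. The county / map unit / profile hierarchy and all names below are assumptions chosen for illustration.

```python
# Hypothetical hierarchy: county -> soil map unit -> profile -> attributes.
tree = {
    "County A": {
        "Unit 1": {"P1": {"drainage": "poor"}, "P2": {"drainage": "well"}},
        "Unit 2": {"P3": {"drainage": "poor"}},
    },
    "County B": {
        "Unit 3": {"P4": {"drainage": "well"}},
    },
}


def get_by_path(root, county, unit, profile):
    """Access by keys follows the single path down the hierarchy."""
    return root[county][unit][profile]


def find_by_attribute(root, attribute, value):
    """Access by an associated attribute must visit every branch."""
    hits = []
    for county, units in root.items():
        for unit, profiles in units.items():
            for profile, attrs in profiles.items():
                if attrs.get(attribute) == value:
                    hits.append((county, unit, profile))
    return hits
```

Retrieval by keys is one step per level, whereas a query on drainage has to walk the whole tree. Each profile also has exactly one parent unit, which is the one to many restriction noted above.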
6- Network data model - many to many relationships.
+ reduces redundancy.
+ more flexible paths to data.
+ very fast
- pointers expensive and difficult to update when inserting and deleting.
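A minimal sketch of the network model, assuming a hypothetical map with two polygons that share a boundary line. Each record holds pointers to the records it is associated with, in both directions, which is what makes navigation fast and updates costly.

```python
# Many-to-many: a polygon has several lines, a line may bound two polygons.
# Identifiers are illustrative only.
polygon_to_lines = {"A": {"l1", "l2"}, "B": {"l2", "l3"}}
line_to_polygons = {"l1": {"A"}, "l2": {"A", "B"}, "l3": {"B"}}


def neighbours(poly):
    """Follow pointers polygon -> lines -> polygons to find adjacent polygons."""
    result = set()
    for line in polygon_to_lines[poly]:
        result |= line_to_polygons[line]
    result.discard(poly)
    return result


def delete_line(line):
    """Deleting a record means repairing every pointer that refers to it."""
    for poly in line_to_polygons.pop(line):
        polygon_to_lines[poly].discard(line)


before = neighbours("A")
delete_line("l2")
after = neighbours("A")
```

The shared line is stored once and referenced from both polygons, which reduces redundancy, but removing it requires touching the pointer sets of every polygon that used it.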
7- Relational data model: data stored as records known as tuples grouped together in two-dimensional tables known as relations. Whereas hierarchical structures rely on the hierarchy and networks depend on pointers to associate entities, the relational model repeats unique key values across tables (a controlled form of redundancy) to identify and link the records in each file. This simplifies data maintenance because the data for an entity type is stored in simple tables. Relational joins are used to cross reference entities using a primary key in one table and a foreign key in another table. Thus, in order to perform relational joins there needs to be at least one column in common between the tables being related.
Apart from these shared keys, the relational model is designed to reduce redundancy of data whenever possible. A set of rules called the normal forms was developed by Codd (1970) to guide this process.
+ structures very flexible.
+ boolean logic and math operations.
+ insert and delete easy.
- often use sequential search unless previously sorted.
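A relational join can be sketched with Python's standard `sqlite3` module. The two tables, their columns and the sample rows are assumptions for illustration; `series_id` is the primary key of one table and a foreign key in the other, which is the common column the join needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE soil_series (
        series_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        drainage  TEXT NOT NULL
    );
    CREATE TABLE map_unit (
        unit_id   INTEGER PRIMARY KEY,
        series_id INTEGER NOT NULL REFERENCES soil_series(series_id),
        area_ha   REAL NOT NULL
    );
    INSERT INTO soil_series VALUES (1, 'Alpha', 'poor'), (2, 'Bravo', 'well');
    INSERT INTO map_unit VALUES (100, 1, 12.5), (101, 2, 40.0), (102, 1, 7.5);
    """
)

# Join on the shared column, then filter with boolean logic.
rows = conn.execute(
    """
    SELECT m.unit_id, s.name, m.area_ha
    FROM map_unit AS m
    JOIN soil_series AS s ON s.series_id = m.series_id
    WHERE s.drainage = 'poor'
    ORDER BY m.unit_id
    """
).fetchall()
```

The drainage class of each series is stored once in `soil_series`; only the key value is repeated in `map_unit`.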
8- Integration of DBMS with the spatial data models
IV. Data Models for Spatial Data
Data structures are complex for GIS because they must include information pertaining to entities with respect to: position, topological relationships, and attribute information. It is the topological and spatial aspects of GIS that distinguish it from other types of data bases.
1. Introduction: There are presently three types of representations for geographic data: raster, vector, and objects.
2. Raster Data model
Definition: realization of the external model which sees the world as a continuously varying surface (field) through the use of 2-D Cartesian arrays forming sets of thematic layers. Space is discretized into a set of connected two dimensional units called a tessellation.
- Each overlay is a 2-D matrix of points carrying the value of a single attribute.
- Each point is represented by a vertical array in which each array position carries a value of the attribute associated with the overlay.
- Map file - each mapping unit lists the coordinates of the cells in which it occurs (greater structure, many to one relationship).
The vertical array is not conducive to compact data coding because it references different entities in sequence and lacks a many to one relationship. The third structure, the map file, references a set of points for each region (or mapping unit) and allows for compaction.
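The three raster structures can be compared on a toy example. The two 3 x 3 overlays and the function names below are assumptions made for illustration.

```python
# Two hypothetical overlays on the same 3 x 3 grid, one attribute each.
soil = [
    ["a", "a", "b"],
    ["a", "b", "b"],
    ["c", "c", "b"],
]
slope = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 2],
]


def vertical_array(layers, row, col):
    """All attribute values at one point, one array position per overlay."""
    return [layer[row][col] for layer in layers]


def map_file(layer):
    """Each mapping unit lists the cells in which it occurs (many to one)."""
    units = {}
    for r, row in enumerate(layer):
        for c, value in enumerate(row):
            units.setdefault(value, []).append((r, c))
    return units


soil_units = map_file(soil)
```

In the map file each mapping unit appears once with its set of cells, which is the grouping that compact coding schemes can exploit.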
+ reduced storage.
+ area, perimeter and shape estimation.
- overlay difficult.
+ reduced storage.
- overlay difficult.
+ reduced storage.
+ union and intersection of regions easy.
+ reduced storage.
+ variable resolution.
+ overlay of variable resolution data.
+ fast search.
Morton Sequencing Overlay
Morton Homework
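These notes name Morton sequencing without defining it. As a rough sketch, a Morton (Z-order) key is commonly formed by interleaving the bits of a cell's row and column numbers, so that cells close together in space tend to be close together in the sequence. The bit convention used here (column bits in the even positions, row bits in the odd positions) and the function names are assumptions; other conventions exist.

```python
def morton_key(row, col, bits=16):
    """Interleave the bits of row and col into a single integer key."""
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)        # column bit -> even position
        key |= ((row >> i) & 1) << (2 * i + 1)    # row bit -> odd position
    return key


def morton_order(n_rows, n_cols):
    """List the cells of a grid in Morton sequence."""
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return sorted(cells, key=lambda rc: morton_key(rc[0], rc[1]))


first_block = morton_order(2, 2)
```

With this convention, each aligned 2 x 2 block of cells occupies four consecutive keys, which is why the sequence pairs naturally with block-based raster structures.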
3. Vector data model
Definition: realization of the discrete model of real world using structures for storing and relating points, lines and polygons in sets of thematic layers.
a. Introduction
b. Representation
3. Area Entity: data structures for storing regions. Data types: land cover, soils, geology, land tenure, census tract, etc.
d. DIME Files (Dual Independent Map Encoding)
e. Arc/node
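The outline lists the arc/node structure without detail. In its usual form, each arc records its start and end nodes and the polygons on its left and right, so adjacency can be read from the table without comparing coordinates. The sketch below assumes a unit square split by a diagonal into polygons "A" and "B"; all identifiers and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    arc_id: int
    from_node: int
    to_node: int
    left_poly: str    # polygon on the left when travelling from -> to
    right_poly: str   # polygon on the right
    vertices: tuple   # coordinates from from_node to to_node


# Nodes are the points where arcs meet.
nodes = {1: (0.0, 0.0), 2: (1.0, 1.0)}

arcs = [
    Arc(10, 1, 2, "A", "outside", ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0))),
    Arc(11, 2, 1, "B", "outside", ((1.0, 1.0), (0.0, 1.0), (0.0, 0.0))),
    Arc(12, 1, 2, "B", "A", ((0.0, 0.0), (1.0, 1.0))),  # shared diagonal
]


def arcs_of(poly):
    """Arcs that form the boundary of a polygon."""
    return [a.arc_id for a in arcs if poly in (a.left_poly, a.right_poly)]


def neighbours(poly):
    """Polygons that share an arc with `poly`, found from topology alone."""
    found = set()
    for a in arcs:
        if a.left_poly == poly:
            found.add(a.right_poly)
        elif a.right_poly == poly:
            found.add(a.left_poly)
    found.discard("outside")
    return found
```

The shared diagonal is stored once and referenced by both polygons, so the common boundary cannot drift apart in the two polygon definitions.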
4. Object data model
Definition: realization of the discrete model of the real world using an object centered approach in which an object has both physical (attribute) and geometric characteristics. Different types of objects can interact because they are not confined to separate layers.
The biggest single difference between the object-oriented conceptual model and the layer-based vector conceptual model for representing geographic information is that in the object model the real world object, not its geometry, is the basis for abstraction. In other words, the objects, not the geometric components of layers, are the "units" for modeling and interaction.
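A small sketch of the object-centered idea, assuming hypothetical `Building` and `FloodZone` classes whose geometry is simplified to a bounding box. Each object carries its own attributes and geometry, and objects of different types interact directly instead of through an overlay of separate layers.

```python
from dataclasses import dataclass


@dataclass
class GeoObject:
    name: str
    bbox: tuple  # simplified geometry: (xmin, ymin, xmax, ymax)

    def overlaps(self, other):
        """True if the two bounding boxes intersect or touch."""
        ax0, ay0, ax1, ay1 = self.bbox
        bx0, by0, bx1, by1 = other.bbox
        return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


@dataclass
class Building(GeoObject):
    floors: int = 1  # physical attribute specific to buildings


@dataclass
class FloodZone(GeoObject):
    return_period_years: int = 100

    def buildings_at_risk(self, objects):
        """Interaction between object types, with no layer overlay step."""
        return [
            o.name
            for o in objects
            if isinstance(o, Building) and self.overlaps(o)
        ]


# All objects live in one collection; values are illustrative only.
world = [
    Building("Barn", (0, 0, 2, 2)),
    Building("Mill", (10, 10, 12, 12), floors=3),
    FloodZone("Creek flood zone", (1, 1, 5, 5)),
]
zone = world[2]
at_risk = zone.buildings_at_risk(world)
```

The unit of modeling is the building or the flood zone itself; the bounding box is just one of its properties.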